feat(sync-service): cap logger handler OLP mailboxes#4456
feat(sync-service): cap logger handler OLP mailboxes#4456erik-the-implementer wants to merge 3 commits into
Conversation
Cap the overload-protection (OLP) mailboxes of the default console, OpenTelemetry, and Sentry logger handlers so error/log bursts shed messages instead of blocking Logger callers or growing unbounded. - Default console handler: sync_mode_qlen/drop_mode_qlen 2000, flush_qlen 5000 (matches the stratovolt cloud-sync-service precedent). - OtelMetricExporter.LogHandler: drop_mode_qlen 2000, flush_qlen 5000. sync_mode_qlen is intentionally omitted — the handler forces sync_mode_qlen == drop_mode_qlen, so it would be a no-op there. - Sentry handler: discard_threshold 2000, sync_threshold nil (matches the stratovolt precedent; the OSS add_logger_handler/2 already accepts opts). overload_kill_enable is deliberately NOT set on either OLP handler: it would terminate the handler (console) or tear down LogHandlerSupervisor (OTel), creating a restart-amplification loop. This is leading-edge protection only; it is not sufficient under deep scheduler starvation. The real fix for that is upstream (request-proxy admission control, snapshot-pool sizing). See electric-sql/alco-agent-tasks#45 §3.
|
Claude Code ReviewSummarySecond pass on the OLP-cap PR. The author posted a detailed rebuttal to the only "Important" item from iteration 1, and pushed two cleanup commits trimming the cross-repo What's Working Well
Issues FoundCritical (Must Fix)None. Important (Should Fix)None. Suggestions (Nice to Have)The remaining iteration-1 suggestions are reasonably deferred by the author and I am not re-raising them:
Issue ConformanceImplementation still matches the PR description: three sites, omitted Previous Review Status
LGTM from my side once CI is green. Review iteration: 2 | 2026-06-02 |
|
Thanks for the thorough review. Disposition of the items below. Important — "set
|
What
Cap the overload-protection (OLP) mailboxes of the three logger handlers used by
sync-serviceso that error/log bursts shed messages instead of blockingLoggercallers or letting processes' mailboxes grow unbounded.Three sites, all inside
packages/sync-service/:config/runtime.exs) —sync_mode_qlen: 2000, drop_mode_qlen: 2000, flush_qlen: 5000. Matches the stratovoltcloud-sync-serviceprecedent.OtelMetricExporter.LogHandler(config/runtime.exs) — extends the existing:configmap withdrop_mode_qlen: 2000, flush_qlen: 5000.sync_mode_qlenis intentionally not set: the handler module forcessync_mode_qlen == drop_mode_qlen, so it would be a no-op. This handler had no caps anywhere before.lib/electric/application.ex) — passesdiscard_threshold: 2000, sync_threshold: niltoElectric.Telemetry.Sentry.add_logger_handler/2(the function already accepts opts; only the call site changed). Matches the stratovolt precedent.Why
During the 2026-05-21 pod-replacement spikes,
logger_olpmailboxes peaked at ~76K/~59K. Capping these mailboxes protects the leading edge of redeployment-shock overload.overload_kill_enableis deliberately not set on either OLP handler: for the console handler it would terminate the handler and lose all subsequent logs until restart; forOtelMetricExporter.LogHandlerit would tear down the entireLogHandlerSupervisor, creating a restart-amplification loop.